Uniqueness, Urn Models and Disclosure Risk
Abstract
The prevalence of categorical observations that are unique in a sample and remain unique in the population is usually taken as the measure of the overall disclosure risk in the sample data. Samuels (1998) suggested adopting evolutionary processes and their associated urn models as a framework for estimating this prevalence. We re-examine his proposal and suggest several extensions that arise naturally in the Bayesian statistical framework. We provide a brief report on some empirical studies using data provided by the Israel Central Bureau of Statistics. We also link this approach to ones based on the structure of cross-classifications, allowing for differential, per-unit forms of risk assessment.
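To make the risk measure concrete, the following minimal sketch counts sample uniques and checks how many of them are also population uniques. It assumes the full population is known, which is exactly what is unavailable in practice and what the urn-model approach is meant to estimate; it is not the estimator discussed in the abstract. The function name `uniqueness_risk` and the toy data are hypothetical.

```python
from collections import Counter


def uniqueness_risk(population, sample_indices):
    """Count sample uniques and those that are also population uniques.

    population: list of categorical keys (one per population unit).
    sample_indices: positions of the sampled units within `population`.
    Returns (n_sample_uniques, n_also_population_unique, share).
    """
    pop_counts = Counter(population)
    sample_counts = Counter(population[i] for i in sample_indices)

    # Keys that occur exactly once in the sample.
    sample_uniques = [key for key, c in sample_counts.items() if c == 1]
    # Of those, keys that also occur exactly once in the population.
    also_pop_unique = [key for key in sample_uniques if pop_counts[key] == 1]

    share = len(also_pop_unique) / len(sample_uniques) if sample_uniques else 0.0
    return len(sample_uniques), len(also_pop_unique), share


# Toy population (assumed for illustration): "b" and "d" are population uniques.
population = ["a", "a", "b", "c", "c", "c", "d"]
sample_indices = [0, 2, 3, 6]  # sampled keys: a, b, c, d
print(uniqueness_risk(population, sample_indices))  # (4, 2, 0.5)
```

In this toy case all four sampled records are unique in the sample, but only two are unique in the population, so half of the sample uniques would count as risky under this measure.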
Similar resources
Disclosure Risk Measurement with Entropy in Two-Dimensional Sample Based Frequency Tables
We extend a disclosure risk measure defined for population based frequency tables to sample based frequency tables. The disclosure risk measure is based on information theoretical expressions, such as entropy and conditional entropy, that reflect the properties of attribute disclosure. To estimate the disclosure risk of a sample based frequency table we need to take into account the underlying ...
Pólya Urn Models and Connections to Random Trees: A Review
This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology: • Pólya-Eggenberger’s urn • Bernard Friedman’s urn • Generalized Pólya urns • Extended urn schemes • Invertible urn schemes ...
Pólya-Type Urn Models with Multiple Drawings
We investigate the distribution, mean value, variance and some limiting properties of an urn model of white and red balls under random multiple drawing (either with or without replacement) when the number of white and red balls added follows a schedule that depends on the number of white balls chosen in each drawing.
Estimating Identification Disclosure Risk Using Mixed Membership Models.
Statistical agencies and other organizations that disseminate data are obligated to protect data subjects' confidentiality. For example, ill-intentioned individuals might link data subjects to records in other databases by matching on common characteristics (keys). Successful links are particularly problematic for data subjects with combinations of keys that are unique in the population. Hence,...
Disclosure Risk and Sample of Anonymized Records
The disclosure problem relates to the possibility of identifying individuals in the released statistical information. The paper evaluates the disclosure risk on a 3% sample of individual data from the Slovene 1991 Population Census. The concept of uniqueness is used for this purpose. The level of regional aggregation, the number of identifying variables and the grouping of the categories are...